IN THE CLAIMS:

Please cancel Claims 5, 64 and 72 without prejudice or disclaimer of subject 
matter. Please amend Claims 1 to 4 and 6 to 35, and add Claims 73 and 74 as shown 
below. The claims, as pending in the subject application, read as follows: 

1. (Currently Amended) Apparatus to identify topics of
document data, the apparatus comprising:

a word ranker configured to rank words that are present in or
representative of the content of the document data;

a co-occurrence ranker configured to rank co-occurrences of words
that are present in or representative of the content of the document data;

a phrase ranker configured to rank phrases in the document data;

a words selector configured to select the words with a highest ranking;

a co-occurrence identifier configured to identify which of the
co-occurrences with a highest ranking contain at least one of the highest
ranking words;

a phrase identifier configured to identify the phrases containing at
least one word from the identified co-occurrences by concatenating consecutive nouns,
concatenating consecutive proper nouns, and concatenating consecutive adjectives with a
final noun;

a phrase selector configured to select one or ones of the identified phrases
with a highest ranking as the topic or topics of the document data; and

an outputter configured to output data relating to the selected topics.
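
By way of illustration only, the phrase identification recited in claim 1 might be sketched as follows. The sketch assumes the document data has already been part-of-speech tagged; the tag names (NOUN, PROPN, ADJ), the function names and the sample words are hypothetical and are not taken from the application. An adjective run is joined to a single following noun, which is one possible reading of "a final noun".

```python
def identify_phrases(tagged):
    """Build candidate phrases from (word, tag) pairs.

    Hypothetical tag set: NOUN, PROPN, ADJ; any other tag breaks a phrase.
    """
    phrases = []
    i, n = 0, len(tagged)
    while i < n:
        tag = tagged[i][1]
        if tag in ("NOUN", "PROPN"):
            # Concatenate a run of consecutive nouns (or consecutive proper nouns).
            j = i
            while j < n and tagged[j][1] == tag:
                j += 1
            phrases.append(" ".join(w for w, _ in tagged[i:j]))
            i = j
        elif tag == "ADJ":
            # Concatenate consecutive adjectives with one final noun, if present.
            j = i
            while j < n and tagged[j][1] == "ADJ":
                j += 1
            if j < n and tagged[j][1] == "NOUN":
                phrases.append(" ".join(w for w, _ in tagged[i:j + 1]))
                i = j + 1
            else:
                i = j
        else:
            i += 1
    return phrases


def phrases_with_cooccurrence_words(phrases, cooccurrence_words):
    """Keep phrases containing at least one word from the identified co-occurrences."""
    wanted = {w.lower() for w in cooccurrence_words}
    return [p for p in phrases if any(w.lower() in wanted for w in p.split())]


tagged = [("rapid", "ADJ"), ("automatic", "ADJ"), ("summarisation", "NOUN"),
          ("of", "ADP"), ("text", "NOUN"), ("documents", "NOUN"),
          ("by", "ADP"), ("Canon", "PROPN"), ("Research", "PROPN")]
candidates = identify_phrases(tagged)
selected = phrases_with_cooccurrence_words(candidates, ["summarisation", "text"])
```

The remaining elements (ranking and selection) would operate on the phrases returned here; how the rankings are computed is not specified by this sketch.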

2. (Currently Amended) The apparatus of
claim 1, wherein the words selector is configured to select as the highest ranking
words a predetermined number of the highest ranking words, a number of the highest 
ranking words that represents a predetermined percentage of the words in the document 
data, or a number of the highest ranking words that represents a predetermined percentage 
of the number of ranked words. 

3. (Currently Amended) The apparatus
of claim 1, wherein the co-occurrence identifier is configured to select as the
highest ranking co-occurrences a predetermined number of co-occurrences, a number of the 
highest ranking co-occurrences that represents a predetermined percentage of the 
co-occurrences in the document data, or a number of the highest ranking co-occurrences 
that represents a predetermined percentage of the number of ranked co-occurrences. 

4. (Currently Amended) The apparatus of
claim 1, wherein the phrase selector is configured to select as the highest ranking
identified phrases a predetermined number of the identified phrases, a number of the 
highest ranking identified phrases that represents a predetermined percentage of the
identified phrases in the document data, or a number of the highest ranking identified 
phrases that represents a predetermined percentage of the number of ranked phrases. 

5. (Cancelled) 

6. (Currently Amended) The apparatus of
claim 1, wherein at least one of the word ranker, co-occurrence ranker, and phrase ranker is
configured to weight the items to be ranked in accordance with their position in
the document data. 

7. (Currently Amended) The apparatus of
claim 1, further comprising a co-occurrence determiner configured to determine
word co-occurrences in the document data by identifying as co-occurrences word
combinations comprising words in particular grammatical categories.

8. (Currently Amended) The apparatus of
claim 1, further comprising a co-occurrence determiner configured to determine
word co-occurrences in the document data by identifying as co-occurrences at least some of
the following combinations:

noun and verb;
noun and noun;
noun and proper noun;
verb and proper noun; and
proper noun and proper noun.

9. (Currently Amended) The apparatus of
claim 7, wherein the co-occurrence determiner is configured to ignore the order of
the words in the word combinations. 

10. (Currently Amended) The apparatus of
claim 1, wherein the co-occurrence ranker is configured to rank significant
co-occurrences and the apparatus further comprises a co-occurrence determiner configured
to determine word co-occurrences in the document data by identifying as
co-occurrences word combinations comprising words in particular grammatical categories
and a significance calculator configured to calculate a significance measure for
the identified co-occurrences. 

11. (Currently Amended) The apparatus of
claim 1, wherein the co-occurrence ranker is configured to rank significant
co-occurrences and the apparatus further comprises a co-occurrence determiner configured
to determine word co-occurrences in the document data by identifying as
co-occurrences at least some of the following combinations: 

noun and verb; 
noun and noun; 
noun and proper noun; 
verb and proper noun; and

proper noun and proper noun, and a significance calculator configured
to calculate a significance measure for the identified co-occurrences.

12. (Currently Amended) The apparatus of
claim 1, further comprising:

a text splitter configured to split the document data into text
segments; and

a classifier configured to classify the selected topics according to
the distribution in the text segments, so as to define main and subsidiary topics in the
document data, wherein the outputter is configured to output data relating to the
classified topics. 

13. (Currently Amended) The
apparatus of claim 16, wherein the classifier is configured to determine that a
topic is a main topic when [[if]] the topic occurs in a predetermined percentage of the text
segments and to classify any topic not meeting this requirement as a subsidiary or lesser 
topic. 
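
As a non-limiting illustration, the main/subsidiary classification of claim 13 could be sketched as below. The function name, the default threshold of 50%, the reading of "a predetermined percentage" as "at least that share of the text segments", and the use of case-insensitive substring matching to decide whether a topic occurs in a segment are all assumptions made for the example.

```python
def classify_topics(topics, segments, main_fraction=0.5):
    """Label each topic 'main' or 'subsidiary' by the share of segments it occurs in.

    main_fraction is a hypothetical predetermined percentage (0.5 = 50%).
    """
    labels = {}
    for topic in topics:
        # Count the text segments in which the topic occurs.
        hits = sum(1 for seg in segments if topic.lower() in seg.lower())
        share = hits / len(segments) if segments else 0.0
        labels[topic] = "main" if share >= main_fraction else "subsidiary"
    return labels


segments = ["Topic ranking uses word counts.",
            "Word counts feed the ranker.",
            "Output is displayed."]
labels = classify_topics(["word counts", "output"], segments)
```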

14. (Currently Amended) The
apparatus of claim 16, wherein the classifier is configured to weight a topic in
accordance with a position in the document data of the text segment containing the
topic. 

15. (Currently Amended) The
apparatus of claim 16, wherein the classifier is configured to weight a topic in
accordance with a position in the document data of the text segments containing the
topic, wherein a topic occurring in at least one of a first and last text segment of
document data representing a document is given a higher weighting than topics occurring 
in the other text segments. 

16. (Currently Amended) Apparatus to identify topics of document data,
the apparatus comprising:

a word ranker configured to rank words that are present in or representative 
of the content of the document data; 

a co-occurrence ranker configured to rank co-occurrences of words that are 
present in or representative of the content of the document data;

a phrase ranker configured to rank phrases in the document data; 

a words selector configured to select the highest ranking words;

a co-occurrence identifier configured to identify which of the highest 
ranking co-occurrences contain at least one of the highest ranking words; 

a phrase identifier configured to identify the phrases containing at least one 
word from the identified co-occurrences; 

a phrase selector configured to select the highest ranking one or ones of the 
identified phrases as the topic or topics of the document data;

an outputter configured to output data relating to the selected topics; 

a text splitter configured to split the document data into text segments; 
a classifier configured to classify the selected topics of the distribution in
the text segments which define main and subsidiary topics in the document data, wherein 
the outputter is configured to output data relating to the classified topics; and 

a topic hierarchy identifier configured to identify a topic as being a
child or subsidiary topic of another topic when text portions in which that subsidiary topic 
occurs represent a sub-set of the text portions in which the said other topic occurs, wherein 
the outputter is configured to output data relating to the identified topic hierarchy.

17. (Currently Amended) Apparatus to identify topics of document data,
the apparatus comprising:

a word ranker configured to rank words that are present in or representative 
of the content of the document data;

a co-occurrence ranker configured to rank co-occurrences of words that are 
present in or representative of the content of the document data; 

a phrase ranker configured to rank phrases in the document data; 

a words selector configured to select the highest ranking words; 

a co-occurrence identifier configured to identify which of the highest 
ranking co-occurrences contain at least one of the highest ranking words;

a phrase identifier configured to identify the phrases containing at least one 
word from the identified co-occurrences;

a phrase selector configured to select the highest ranking one or ones of the 
identified phrases as the topic or topics of the document data;

an outputter configured to output data relating to the selected topics; 
a text splitter configured to split the document data into text segments;
a classifier configured to classify the selected topics of the distribution in 
the text segments which define main and subsidiary topics in the document data, wherein 
the outputter is configured to output data relating to the classified topics; and 

a topic hierarchy identifier configured to identify a topic as being a
child or subsidiary topic of another topic when the text segments in which that subsidiary 
topic occurs represent a sub-set of the text segments in which the said other topic occurs, 
wherein the outputter is configured to output data relating to the identified topic
hierarchy. 
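
Purely as an illustrative sketch, the topic hierarchy test of claims 16 and 17 might be expressed with set containment. It assumes each topic has already been mapped to the set of indices of the text segments in which it occurs. The function name and sample data are hypothetical, and the use of a proper subset (so that two topics with identical segment sets are not each treated as the other's child) is an assumption rather than something the claims state.

```python
def topic_hierarchy(occurrences):
    """Map each child topic to the topics whose segment sets strictly contain its own.

    occurrences: dict of topic -> set of text-segment indices where the topic occurs.
    """
    parents = {}
    for child, child_segs in occurrences.items():
        for other, other_segs in occurrences.items():
            # A topic is a child when its segments are a proper subset of the other's.
            if child != other and child_segs and child_segs < other_segs:
                parents.setdefault(child, []).append(other)
    return parents


occurrences = {
    "summarisation": {0, 1, 2, 3},
    "sentence scoring": {1, 2},
    "fonts": {5},
}
hierarchy = topic_hierarchy(occurrences)
```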

18. (Currently Amended) The apparatus of
claim 1, further comprising a summary provider configured to provide summary
data on the basis of the selected topics, wherein the outputter is configured to
output the summary data. 

19. (Currently Amended) The apparatus
of claim 18, wherein the summary provider comprises a sentence selector configured
to select sentences for use in the summary data.

20. (Currently Amended) The
apparatus of claim 22, wherein the sentence selector comprises:

a topic weight assigner configured to assign weights to the topics;

a sentence weight assigner configured to assign weights to
sentences in the document data;

a scorer configured to score the sentences by summing the
assigned topic and sentence weights; and

a selector configured to select the sentence or sentences having the
highest score or scores for the summary. 

21. (Currently Amended) The apparatus of
claim 19, wherein the sentence selector comprises:

a topic weight assigner configured to assign weights to the topics;

a sentence weight assigner configured to assign weights to
sentences in the document data;

a scorer configured to score the sentences by summing the
assigned topic and sentence weights;

a selector configured to select the sentence or sentences having the
highest score or scores;

a topic weight adjuster configured to relatively reduce the weight
allocated to the topic or topics in the selected sentence or sentences; and

a controller configured to cause the scorer, selector and topic
weight adjuster to repeat the above operations until a predetermined number of sentences 
has been selected for the summary from the document data. 
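
For illustration only, the iterative sentence selection of claim 21 could be sketched as follows. Each sentence is represented here as a hypothetical (text, sentence weight, topics) tuple; the function name, the multiplicative `reduction` factor and the sample weights are assumptions. Passing `reduction=0.0` corresponds to setting the weight of any topic in a selected sentence to zero.

```python
def select_sentences(sentences, topic_weights, count, reduction=0.5):
    """Pick `count` sentences, reducing topic weights after each pick.

    sentences: list of (text, sentence_weight, set_of_topics) tuples (hypothetical layout).
    reduction: factor applied to the weight of every topic in a selected sentence.
    """
    weights = dict(topic_weights)          # work on a copy of the topic weights
    remaining = list(range(len(sentences)))
    chosen = []

    def score(i):
        # Score = sentence weight plus the current weights of the topics it contains.
        _, sentence_weight, topics = sentences[i]
        return sentence_weight + sum(weights.get(t, 0.0) for t in topics)

    while remaining and len(chosen) < count:
        best = max(remaining, key=score)   # highest-scoring sentence not yet selected
        chosen.append(best)
        remaining.remove(best)
        for t in sentences[best][2]:       # relatively reduce the covered topics
            if t in weights:
                weights[t] *= reduction
    return [sentences[i][0] for i in chosen]


sentences = [("A", 1.0, {"x"}), ("B", 0.9, {"x"}), ("C", 0.5, {"y"})]
topic_weights = {"x": 2.0, "y": 1.5}
summary = select_sentences(sentences, topic_weights, count=2, reduction=0.0)
```

Reducing the weight of topics already covered steers later selections towards sentences that mention other topics, which is the apparent purpose of repeating the scoring after each selection.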



22. (Currently Amended) Apparatus
to identify topics of document data, the apparatus comprising:

a word ranker configured to rank words that are present in or representative 
of content of the document data; 

a co-occurrence ranker configured to rank co-occurrences of words that are 
present in or representative of the content of the document data;

a phrase ranker configured to rank phrases in the document data; 

a words selector configured to select the highest ranking words;

a co-occurrence identifier configured to identify which of the highest 
ranking co-occurrences contain at least one of the highest ranking words;

a phrase identifier configured to identify the phrases containing at least one 
word from the identified co-occurrences; 

a phrase selector configured to select the highest ranking one or ones of the 
identified phrases as the topic or topics of the document data; and 

a summary provider configured to provide summary data on the basis of the 
selected topics, wherein the summary provider comprises a sentence selector configured to 
select sentences to use in the summary data; 

wherein the sentence selector comprises: 

a topic weight assigner configured to assign weights to the topics;
a sentence weight assigner configured to assign weights to sentences in the
document data;

a scorer configured to score the sentences by summing the assigned topic
and sentence weights;
a selector configured to select the sentence or sentences having the highest
score or scores; 

a topic weight adjuster configured to relatively reduce the weight allocated 
to the topic or topics in the selected sentence or sentences, wherein the topic weight 
adjuster is configured to set to zero the weight of any topic in the selected
sentence or sentences; 

a controller configured to cause the scorer, selector and topic weight 
adjuster to repeat the above operations until a predetermined number of sentences has been 
selected for the summary from the document data; and 

an outputter configured to output the summary data.

23. (Currently Amended) The apparatus of
claim 19, further comprising:

a chunk identifier configured to identify in sentences selected for a
summary chunks that do not contain words in the selected topics; and

a chunk modifier configured to modify the identified chunks.

24. (Currently Amended) The
apparatus of claim 26, wherein the chunk modifier is configured to modify chunks
by replacing them by ellipsis. 

25. (Currently Amended) The apparatus of
claim 23, wherein the chunk modifier is configured to modify chunks by causing
them to be displayed so as to place less emphasis on the modified chunks.

26. (Currently Amended) Apparatus
to identify topics of document data, the apparatus comprising:

a word ranker configured to rank words that are present in or representative 
of the content of the document data; 

a co-occurrence ranker configured to rank co-occurrences of words that are 
present in or representative of the content of the document data; 

a phrase ranker configured to rank phrases in the document data; 

a words selector configured to select the highest ranking words; 

a co-occurrence identifier configured to identify which of the highest 
ranking co-occurrences contain at least one of the highest ranking words; 

a phrase identifier configured to identify the phrases containing at least one 
word from the identified co-occurrences; 

a phrase selector configured to select the highest ranking one or ones of the 
identified phrases as the topic or topics of the document data; 

a summary provider configured to provide summary data on the basis of the 
selected topics, wherein the summary provider comprises a sentence selector configured to 
select sentences to use in the summary data; 

a chunk identifier configured to identify in sentences selected for a summary 
chunks that do not contain words in the selected topics; and 
a chunk modifier configured to modify the identified chunks wherein the
chunk modifier is configured to modify chunks by causing them to be displayed which
place less emphasis on the modified chunks; and 

an outputter configured to output the summary data,

wherein the outputter is configured to output the summary data and the 
chunk modifier is configured to modify chunks to cause, when the outputter
provides output data for display by a display, the modified chunks to be displayed using at 
least one of a smaller font size, a different font, a different font characteristic and a 
different font colour from the other chunks. 

27. (Currently Amended) The
apparatus of claim 26, wherein the chunk modifier is configured to remove the
identified chunks. 

28. (Currently Amended) The
apparatus of claim 26, further comprising a processor configured to carry out
syntactic or semantic processing on sentences from which chunks have been removed to 
maintain sentence coherence or cohesion. 

29. (Currently Amended) The
apparatus of claim 26, wherein the chunk identifier is configured to identify
chunks by using punctuation marks to define the bounds of the chunks. 

30. (Currently Amended) The
apparatus of claim 22, wherein the summary provider comprises a locater configured
to locate words present in or representative of the content of the document data
that co-occur with words in the topics; and

the outputter is configured to output summary data in which the or
each topic is associated with subsidiary items comprising located co-occurring words. 

31. (Currently Amended) The
apparatus of claim 22, wherein the summary provider further comprises a further locater
configured to locate all words present in or representative of the content of the
document data that co-occur with the subsidiary items and the outputter is configured
to associate each such co-occurring word with the corresponding subsidiary item
in the summary data. 

32. (Currently Amended) The
apparatus of claim 22, wherein the summary provider further comprises a filter configured
to filter the co-occurring words to select for the summary data those co-occurring
words that themselves have co-occurrences with the subsidiary items. 

33. (Currently Amended) The apparatus of
claim 1, further comprising a concept identifier configured to identify from the
document data concepts that determine words representative of the content of the document 
data. 

34. (Currently Amended) The
apparatus of claim 22, wherein the concept identifier is configured to identify as
concepts at least one of synonyms, hypernyms and hyponyms in or relating to
the document data. 

35. (Currently Amended) The
apparatus of claim 22, wherein the concept identifier is configured to access a
lexical database to identify as concepts at least one of synonyms, hypernyms and
hyponyms in or relating to the document data.

36. (Withdrawn) Co-occurrence significance calculating apparatus 
for use in text summarisation apparatus, the co-occurrence significance calculating 
apparatus comprising: 

a co-occurrence identifier operable to identify as co-occurrences particular 
combinations of categories of words present in or representative of the content of 
document data; 

a significance calculator operable to calculate a significance measure for the 
identified co-occurrences to determine significant ones of the identified co-occurrence; and 

an outputter operable to output data representing the determined significant 
co-occurrences. 

37. (Withdrawn) Apparatus according to claim 36, wherein the 
co-occurrence identifier is arranged to identify as co-occurrences at least some of the
following combinations: 

noun and verb; 

noun and noun; 

noun and proper noun; 

verb and proper noun; and

proper noun and proper noun, and the significance calculator is
operable to calculate a significance measure for the identified co-occurrences. 

38. (Withdrawn) Apparatus according to claim 36, wherein the 
co-occurrence determiner is arranged to ignore the order of the words in the word 
combinations. 

39. (Withdrawn) Apparatus for searching document data, the apparatus 

comprising: 

a receiver operable to receive query terms supplied by a user; 

a co-occurrence determiner operable to identify, for each query term, 
co-occurrences of words present in or representative of the content of the document data 
that include the query terms; and 

an outputter operable to output parts or portions of the document data 
containing the identified co-occurrences. 

40. (Withdrawn) Apparatus according to claim 39, wherein the 
co-occurrence determiner is arranged to identify as co-occurrences word combinations
comprising words in particular grammatical categories. 

41. (Withdrawn) Apparatus according to claim 39, wherein the
co-occurrence determiner is arranged to identify as co-occurrences at least some of the 
following combinations: 

noun and verb; 

noun and noun; 

noun and proper noun; 

verb and proper noun; and 

proper noun and proper noun. 



42. (Withdrawn) Apparatus according to claim 39, wherein the 
co-occurrence determiner is arranged to ignore the order of the words in the word 
combinations. 



43. (Withdrawn) Apparatus for classifying topics in document data, 
which apparatus comprises: 

a text splitter operable to split the document data into text segments; 

a classifier operable to classify topics in the document data according to the 
distribution of the topics in the text segments so as to define main and subsidiary topics in 
the document data; and 

an outputter operable to output data representing the classified topics. 

44. (Withdrawn) Apparatus according to claim 43, wherein the
classifier is arranged to determine that a topic is a main topic if the topic occurs in a 
predetermined percentage of the text segments and to classify any topic not meeting this 
requirement as a subsidiary or lesser topic. 

45. (Withdrawn) Apparatus according to claim 43, wherein the 
classifier is arranged to weight a topic in accordance with the position in the document data 
of the text segment containing the topic. 

46. (Withdrawn) Apparatus according to claim 43, wherein the 
classifier is arranged to weight a topic in accordance with the position in the document data 
of the text segment containing the topic so that a topic occurring in at least one of the first 
and last text segments of document data representing a document is given a higher 
weighting than topics occurring in the other text segments. 

47. (Withdrawn) Apparatus for selecting sentences for use in a 
summary, the apparatus comprising: 

a topic weight assigner operable to assign weights to topics in document 
data to be summarised; 

a sentence weight assigner operable to assign weights to sentences in the 
document data; 

a scorer operable to score each sentence in the document data by summing 
the assigned weights; 
a selector operable to select the sentence or sentences having the highest
score;

a topic weight adjuster operable to relatively reduce the weight allocated to 
topics in the selected sentence or sentences; and 

a controller operable to cause the scorer, selector and topic weight adjuster 
to repeat the above operations until a certain number of sentences has been selected for the 
summary from the document data. 

48. (Withdrawn) Apparatus according to claim 47, wherein the topic 
weight adjuster is arranged to set to zero the weight of any topic in the selected sentence or 
sentences. 



49. (Withdrawn) Apparatus for providing a summary of document data, 
which apparatus comprises: 

a receiver operable to receive data representing the topic or topics of the 
document data; 

a locator operable to locate, for words in the or each topic, words in or 
representative of the content of the document data that co-occur with those words; and 

an outputter operable to output summary data in which the or each topic is 
associated with subsidiary items comprising located co-occurring words. 

50. (Withdrawn) Apparatus according to claim 49, wherein the summary 
provider further comprises a further locator operable to locate all words present in or
representative of the content of the document data that co-occur with the subsidiary items 
and the outputter is arranged to associate each such co-occurring word with the 
corresponding subsidiary item in the summary data. 

51. (Withdrawn) Apparatus according to claim 49, wherein the summary
provider further comprises a filter operable to filter the co-occurring words to select for the 
summary data those co-occurring words that themselves have co-occurrences with the 
subsidiary items. 

52. (Withdrawn) Apparatus for modifying chunks of sentences selected
for a document data summary, which apparatus comprises: 

a chunk identifier operable to identify chunks that do not contain words in 

topics representative of the content of the document data; 

a chunk modifier operable to modify the identified chunks; and 

an outputter operable to output the document data summary with the 

identified chunks of the selected sentences modified by the chunk modifier. 

53. (Withdrawn) Apparatus according to claim 52, wherein the chunk 
modifier is arranged to modify chunks by replacing them by ellipsis. 

54. (Withdrawn) Apparatus according to claim 52, wherein the chunk 
modifier is arranged to modify chunks by causing them to be displayed so as to place less
emphasis on the modified chunks. 

55. (Withdrawn) Apparatus according to claim 52, wherein the chunk 
modifier is arranged to modify chunks to cause, when the outputter provides output data for 
display by a display, the modified chunks to be displayed using at least one of a smaller 
font size, a different font, a different font characteristic and a different font colour from the 
other chunks. 

56. (Withdrawn) Apparatus according to claim 52, wherein the chunk 
modifier is arranged to remove the identified chunks. 

57. (Withdrawn) Apparatus according to claim 56, further comprising a 
processor operable to carry out syntactic or semantic processing on sentences from which 
chunks have been removed to maintain sentence coherence or cohesion. 

58. (Withdrawn) Apparatus according to claim 52, wherein the chunk 
identifier is arranged to identify chunks by using punctuation marks to define the bounds of 
the chunks. 

59. (Withdrawn) Apparatus according to claim 52, further comprising 
a sentence selector operable to select the sentences for use in the summary data. 

60. (Withdrawn) Apparatus according to claim 59, wherein the sentence
selector comprises: 

a topic weight assigner operable to assign weights to the topics; 
a sentence weight assigner operable to assign weights to sentences in the 
document data; 

a scorer operable to score the sentences by summing the assigned topic and 
sentence weights; and 

a selector operable to select the sentence or sentences having the highest 
score or scores for the summary. 

61. (Withdrawn) Apparatus according to claim 52, wherein the sentence 
selector comprises: 

a topic weight assigner operable to assign weights to the topics; 
a sentence weight assigner operable to assign weights to sentences in the 
document data; 

a scorer operable to score the sentences by summing the assigned topic and 
sentence weights; 

a selector operable to select the sentence or sentences having the highest 
score or scores; 

a topic weight adjuster operable to reduce the weight allocated to the topic 
or topics in the selected sentence or sentences; and 
a controller operable to cause the scorer, selector and topic weight adjuster
to repeat the above operations until a predetermined number of sentences has been selected 
for the summary from the document data. 



62. (Withdrawn) A method of identifying topics of document data, the 
method comprising a processor carrying out the steps of: 

ranking words that are present in or representative of the content of the 
document data; 

ranking co-occurrences of words that are present in or representative of the 
content of the document data; 

ranking phrases in the document data; 
selecting the highest ranking words; 

identifying which of the highest ranking co-occurrences contain at least one 
of the highest ranking words; 

identifying the phrases containing at least one word from the identified 
co-occurrences; 

selecting the highest ranking one or ones of the identified phrases as the 
topic or topics of the document data; and 

outputting data relating to the selected topics. 

63. (Withdrawn) A method of calculating co-occurrence significances 
for use in text summarisation apparatus, the method comprising a processor carrying out 
the steps of: 
identifying as co-occurrences particular combinations of categories of words
present in or representative of the content of document data; 

calculating a significance measure for the identified co-occurrences to 
determine significant ones of the identified co-occurrence; and 

outputting data representing the determined significant co-occurrences. 

64. (Cancelled) 

65. (Withdrawn) A method of classifying topics in document data, 
which apparatus comprises a processor carrying out the steps of: 

splitting the document data into text segments; 

classifying topics in the document data according to the distribution of the 
topics in the text segments so as to define main and subsidiary topics in the document data; 
and 

outputting data representing the classified topics. 

66. (Withdrawn) A method of selecting sentences for use in a
summary, the method comprising a processor carrying out the steps of: 

assigning weights to topics in document data to be summarised; 

assigning weights to sentences in the document data; 

scoring each sentence in the document data by summing the assigned 

weights; 

selecting the sentence or sentences having the highest score; 
relatively reducing the weight allocated to topics in the selected sentence or
sentences; and

repeating the scoring, selecting and topic weight adjusting steps until a 
certain number of sentences has been selected for the summary from the document data. 

67. (Withdrawn) A method of providing a summary of document data, 
which method comprises a processor carrying out the steps of: 

receiving data representing the topic or topics of the document data; 

locating, for words in the or each topic, words in or representative of the 
content of the document data that co-occur with those words; and 

outputting summary data in which the or each topic is associated with 
subsidiary items comprising located co-occurring words. 

68. (Withdrawn) A method of modifying chunks of sentences selected 
for a document data summary, which method comprises a processor carrying out the steps 
of: 

identifying chunks that do not contain words in topics representative of the 
content of the document data; 

modifying the identified chunks; and 

outputting the document data summary with the modified identified chunks 
of the selected sentences. 

69. (Withdrawn) Program instructions for programming a processor to 
carry out a method in accordance with claim 62.

70. (Withdrawn) A storage medium storing program instructions in 
accordance with claim 69. 

71. (Withdrawn) A signal carrying program instructions in accordance
with claim 69.

72. (Cancelled) 

73. (New) A method to identify topics of document data, the method 
comprising the steps of: 

ranking words that are present in or representative of content of the 
document data; 

ranking co-occurrences of words that are present in or representative of the 
content of the document data; 

ranking phrases in the document data; 
selecting the words with a highest ranking; 

identifying which of the co-occurrences with a highest ranking contain at 
least one of the highest ranking words; 

identifying the phrases containing at least one word from the identified 
co-occurrences by concatenating consecutive nouns, concatenating consecutive proper 
nouns, and concatenating consecutive adjectives with a final noun; 
selecting one or ones of the identified phrases with a highest ranking as the
topic or topics of the document data; and 

outputting data relating to the selected topics. 

74. (New) A computer-executable program stored on a computer- 
readable storage medium, the program for identifying topics of document data, the program 
comprising code for performing the steps of: 

ranking words that are present in or representative of content of the 
document data; 

ranking co-occurrences of words that are present in or representative of the 
content of the document data; 

ranking phrases in the document data; 
selecting the words with a highest ranking; 

identifying which of the co-occurrences with a highest ranking contain at 
least one of the highest ranking words; 

identifying the phrases containing at least one word from the identified 
co-occurrences by concatenating consecutive nouns, concatenating consecutive proper 
nouns, and concatenating consecutive adjectives with a final noun; 

selecting one or ones of the identified phrases with a highest ranking as the 
topic or topics of the document data; and 

outputting data relating to the selected topics. 